PATENT



VECTOR ADAPTIVE PREDICTIVE 
CODER FOR SPEECH AND AUDIO 



ORIGIN OF INVENTION

The invention described herein was made in the
performance of work under a NASA contract, and is
subject to the provisions of Public Law 96-517 (35
USC 202) under which the inventors were granted a
request to retain title.

BACKGROUND OF THE INVENTION 

This invention relates to a real-time coder for
compression of digitally encoded speech or audio
signals for transmission or storage, and more
particularly to a real-time vector adaptive predictive
coding system.

In the past few years, most research in speech
coding has focused on bit rates from 16 kb/s down to
150 bits/s. At the high end of this range, it is
generally accepted that toll quality can be achieved
at 16 kb/s by sophisticated waveform coders which are
based on scalar quantization. N.S. Jayant and P.
Noll, Digital Coding of Waveforms, Prentice-Hall
Inc., Englewood Cliffs, N.J., 1984. At the other
end, coders (such as linear-predictive coders)
operating at 2400 bits/s or below only give
synthetic-quality speech. For bit rates between these two
extremes, particularly between 4.8 kb/s and 9.6 kb/s,
neither type of coder can achieve high-quality
speech. Part of the reason is that scalar
quantization tends to break down at a bit rate of 1
bit/sample. Vector quantization (VQ), through its
theoretical optimality and its capability of
operating at a fraction of one bit per sample, offers the
potential of achieving high-quality speech at 9.6
kb/s or even at 4.8 kb/s. J. Makhoul, S. Roucos, and
H. Gish, "Vector Quantization in Speech Coding,"
Proc. IEEE, Vol. 73, No. 11, November 1985.

Vector quantization (VQ) can achieve a
performance arbitrarily close to the ultimate
rate-distortion bound if the vector dimension is large
enough. T. Berger, Rate Distortion Theory,
Prentice-Hall Inc., Englewood Cliffs, N.J., 1971. However,
only small vector dimensions can be used in practical
systems due to complexity considerations, and
unfortunately, direct waveform VQ using small dimensions
does not give adequate performance. One possible way
to improve the performance is to combine VQ with
other data compression techniques which have been
used successfully in scalar coding schemes.

In speech coding below 16 kb/s, one of the
most successful scalar coding schemes is Adaptive
Predictive Coding (APC) developed by Atal and
Schroeder [B.S. Atal and M.R. Schroeder, "Adaptive
Predictive Coding of Speech Signals," Bell Syst.
Tech. J., Vol. 49, pp. 1973-1986, October 1970; B.S.
Atal and M.R. Schroeder, "Predictive Coding of
Speech Signals and Subjective Error Criteria," IEEE
Trans. Acoust., Speech, Signal Proc., Vol. ASSP-27,
No. 3, June 1979; and B.S. Atal, "Predictive Coding
of Speech at Low Bit Rates," IEEE Trans. Comm., Vol.
COM-30, No. 4, April 1982]. It is the combined power
of VQ and APC that led to the development of the
present invention, a Vector Adaptive Predictive Coder
(VAPC). Such a combination of VQ and APC will
provide high-quality speech at bit rates between 4.8 and
9.6 kb/s, thus bridging the gap between scalar coders
and VQ coders.

The basic idea of APC is to first remove the
redundancy in speech waveforms using adaptive linear
predictors, and then quantize the prediction residual
using a scalar quantizer. In VAPC, the scalar
quantizer in APC is replaced by a vector quantizer (VQ).
The motivation for using VQ is two-fold. First,
although linear dependency between adjacent speech
samples is essentially removed by linear prediction,
adjacent prediction residual samples may still have
nonlinear dependency which can be exploited by VQ.
Secondly, VQ can operate at rates below one bit per
sample. This is not achievable by scalar
quantization, but it is essential for speech coding at low
bit rates.

The vector adaptive predictive coder (VAPC)
has evolved from APC and the vector predictive coder
introduced by V. Cuperman and A. Gersho, "Vector
Predictive Coding of Speech at 16 kb/s," IEEE Trans.
Comm., Vol. COM-33, pp. 685-696, July 1985. VAPC
contains some features that are somewhat similar to
the Code-Excited Linear Prediction (CELP) coder by
M.R. Schroeder, B.S. Atal, "Code-Excited Linear
Prediction (CELP): High-Quality Speech at Very Low Bit
Rates," Proc. Int'l. Conf. Acoustics, Speech, Signal
Proc., Tampa, March 1985, but with much less
computational complexity.

In computer simulations, VAPC gives very good
speech quality at 9.6 kb/s, achieving 18 dB of
signal-to-noise ratio (SNR) and 16 dB of segmental SNR.
At 4.8 kb/s, VAPC also achieves reasonably good
speech quality, and the SNR and segmental SNR are
about 13 dB and 11.5 dB, respectively. The
computations required to achieve these results are only on
the order of 2 to 4 million flops per second (one
flop, a floating point operation, is defined as one
multiplication, one addition, plus the associated
indexing), well within the capability of today's
advanced digital signal processor chips. VAPC may
become a low-complexity alternative to CELP, which is
known to have achieved excellent speech quality at an
expected bit rate around 4.8 kb/s but is not
presently capable of being implemented in real time due
to its astronomical complexity. It requires over 400
million flops per second to implement the coder. In
terms of the CPU time of a CRAY-1 supercomputer, CELP
requires 125 seconds of CPU time to encode one second
of speech. There is currently a great need for a
real-time, high-quality speech coder operating at
encoding rates ranging from 4.8 to 9.6 kb/s. In this
range of encoding rates, the two coders mentioned
above (APC and CELP) are either unable to achieve
high quality or too complex to implement. In
contrast, the present invention, which combines Vector
Quantization (VQ) with the advantages of both APC and
CELP, is able to achieve high-quality speech with
sufficiently low complexity for real-time coding.

OBJECTS AND SUMMARY OF THE INVENTION 

An object of this invention is to encode in
real time analog speech or audio waveforms into a
compressed bit stream for storage and/or
transmission, and subsequent reconstruction of the waveform
for reproduction.

Another object is to provide adaptive
postfiltering of a speech or audio signal that has been
corrupted by noise resulting from a coding system or
other sources of degradation so as to enhance the
perceived quality of said speech or audio signal.

The objects of this invention are achieved by a system
which approximates each vector of K speech samples by using each
of M fixed vectors stored in a VQ codebook to excite a
time-varying synthesis filter and picking the best synthesized vector
that minimizes a perceptually meaningful distortion measure. The
original sampled speech is first buffered and partitioned into
vectors and frames of vectors, where each frame is partitioned
into N vectors, each vector having K speech samples. Predictive
analysis of pitch-filtering parameters (P), linear-predictive
coefficient filtering parameters (LPC), perceptual weighting filter
parameters (W) and residual gain scaling factor (G) for each of
successive frames of speech is then performed. The parameters
determined in the analyses are quantized and reset every frame for
processing each input vector s_n in the frame, except the
perceptual weighting parameter. A perceptual weighting filter
responsive to the parameters W is used to help select the VQ
vector that minimizes the perceptual distortion between the coded
speech and the original speech. Although not quantized, the
perceptual weighting filter parameters are also reset every frame.

After each frame is buffered and the above analysis is
completed at the beginning of each frame, M zero-state response
vectors are computed and stored in a zero-state response codebook.
These M zero-state response vectors are obtained by first setting
to zero the memory of an LPC synthesis filter and a perceptual
weighting filter in cascade with a scaling unit controlled by the
factor G, and then controlling the respective filters with the
quantized LPC filter parameters and the unquantized perceptual
weighting filter parameters, and exciting the cascaded filters
using one predetermined and fixed vector quantization (VQ)
codebook vector at a time. The output vector of the cascaded
filters for each VQ codebook vector is then stored in a temporary
zero-state codebook at the corresponding address, i.e., is
assigned the same index of a temporary zero-state response
codebook as the index of the exciting vector out of the VQ
codebook. In encoding each input speech vector s_n within a frame,
a pitch-predicted vector ŝ_n of the vector s_n is determined by
processing the last vector encoded as an index code through a
scaling unit, LPC synthesis filter and pitch predictor filter
controlled by the parameters QG, QLPC, QP and QPP for the frame.
In addition, the zero-input response of the cascaded filters (the
ringing from excitation of a previous vector) is first set in a
zero-input response filter. Once the pitch-predicted vector ŝ_n is
subtracted from the input signal vector s_n, and a difference
vector d_n is passed through the perceptual weighting filter to
produce a filtered difference vector f_n, the zero-input response
vector in the aforesaid zero-input response filter is subtracted
from the output of the perceptual weighting filter, namely the
difference vector f_n, and the resulting vector v_n is compared with
each of the M stored zero-state response vectors in search of the
one having a minimum difference Δ or distortion.

The index (address) of the zero-state response vector
that produces the smallest distortion, i.e., that is closest to
v_n, identifies the best vector in the permanent VQ codebook. Its
index (address) is transmitted as the vector compressed code for
the vector s_n, and used by a receiver which has an identical VQ
codebook as the transmitter to find the best-match vector. In the
transmitter, that best-match vector is used at the time of
transmission of its index to excite the LPC synthesis filter and
pitch prediction filter to generate an estimate ŝ_n of the next
speech vector. The best-match vector is also used to excite the
zero-input response filter to set it for the next input vector s_n
to be processed as described above. The indices of the
best-match vectors for a frame of vectors are combined in a multiplexer
with the frame analysis information, hereinafter referred to as
"side information," comprised of the indices of quantized
parameters which control pitch, pitch predictor and LPC predictor
filtering and the gain used in the coding process, in order that
it be used by the receiver in decoding the vector indices of a
frame into vectors using a codebook identical to the permanent VQ
codebook at the transmitter. This side information is preferably
transmitted through the multiplexer first, once for each frame of
VQ indices that follow, but it would be possible to first transmit
a frame of vector indices, and then transmit the side information
since the frames of vector indices will require some buffering in
either case; the difference is only in some initial delay at the
beginning of speech or audio frames transmitted in succession.
The resulting stream of multiplexed indices is transmitted over a
communication channel to a decoder, or stored for later decoding.

In the decoder, the bit stream is first demultiplexed to
separate the side information from the encoded vector indices that
follow. Each encoded vector index is used at the receiver to
extract the corresponding vector from the duplicate VQ codebook.
The extracted vector is first scaled by the gain parameter, using
a table to convert the quantized gain index to the appropriate
scaling factor, and then used to excite cascaded LPC synthesis and
pitch synthesis filters controlled by the same side information
used in selecting the best-match index utilizing the zero-state
response codebook in the transmitter. The output of the pitch
synthesis filter is the coded speech, which is perceptually close
to the original speech. All of the side information, except the
gain information, is used in an adaptive postfilter to enhance the
quality of the speech synthesized. This postfiltering technique
may be used to enhance any voice or audio signal. All that would
be required is an analysis section to produce the parameters used
to make the postfilter adaptive.
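
The decoding path just described (codebook lookup, gain scaling, LPC synthesis, then pitch synthesis) can be sketched roughly as follows. This is only an illustration under stated assumptions: the LPC synthesis filter is taken to be the all-pole filter 1/(1 - sum a_i z^-i), the pitch synthesis filter is reduced to a single tap 1/(1 - b z^-lag), and the function name `decode_frame` and all numeric values are hypothetical. The postfilter is omitted.

```python
def decode_frame(indices, codebook, gain, lpc, pitch_lag, pitch_coef):
    """Decode one frame of VQ indices into output samples.

    lpc:        a_1..a_p of an assumed synthesis filter 1/(1 - sum a_i z^-i)
    pitch_lag,
    pitch_coef: assumed one-tap pitch synthesis filter 1/(1 - b z^-lag)
    Filter memories start at zero here for simplicity; a real decoder
    would carry them over from the previous frame.
    """
    lpc_mem = [0.0] * len(lpc)      # past LPC synthesis outputs, newest first
    pitch_mem = [0.0] * pitch_lag   # past output samples, newest first
    out = []
    for idx in indices:
        for c in codebook[idx]:                 # look up the codebook vector
            e = gain * c                        # gain scaling
            d = e + sum(a * m for a, m in zip(lpc, lpc_mem))   # LPC synthesis
            lpc_mem = [d] + lpc_mem[:-1]
            s = d + pitch_coef * pitch_mem[-1]  # pitch synthesis, s[n - lag]
            pitch_mem = [s] + pitch_mem[:-1]
            out.append(s)
    return out


# Toy example: 2-sample vectors, first-order LPC, pitch lag of 2 samples.
toy_codebook = [[1.0, 0.0], [0.0, 0.0]]
decoded = decode_frame([0, 1], toy_codebook, gain=2.0,
                       lpc=[0.5], pitch_lag=2, pitch_coef=0.5)
```

Because the receiver holds the same codebook and the same side information, it only needs the indices to rebuild each vector.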

According to a broad aspect of the invention there is
provided an improvement in the method for compressing digitally
encoded input speech or audio vectors at a transmitter by using a
scaling unit controlled by a quantized residual gain factor QG, a
synthesis filter controlled by a set of quantized linear
predictive coefficient parameters QLPC, a pitch predictor
controlled by pitch and pitch predictor parameters QP and QPP, a
weighting filter controlled by a set of perceptual weighting
parameters W, and a permanent indexed codebook containing a
predetermined number M of codebook vectors, each having an
assigned codebook index, to find an index which identifies the
best match between an input speech or audio vector s_n that is to
be coded and a synthesized vector ŝ_n generated from a stored
vector in said indexed codebook, wherein each of said digitally
encoded input vectors consists of a predetermined number K of
digitally coded samples, comprising the steps of

buffering and grouping said input speech or audio vectors
into frames of vectors with a predetermined number N of vectors in
each frame,

performing an initial analysis for each successive frame,
said analysis including the computation of a residual gain factor
G, a set of perceptual weighting parameters W, a pitch parameter
P, a pitch predictor parameter PP, and a set of said linear
predictive coefficient parameters LPC, and the computation of
quantized values QG, QP, QPP and QLPC of parameters G, P, PP and
LPC using one or more indexed quantizing tables for the
computation of each quantized parameter or set of parameters,
for each frame transmitting indices of said quantized 
parameters QG, QP, QPP and QLPC determined in the initial analysis 
step as side information about vectors analyzed for later use in 
looking up in one or more identical tables said quantized 
parameters QG, QP, QPP and QLPC while reconstructing speech and 
audio vectors from encoded vectors in a frame, where each index 
for a quantized parameter points to a location in one or more of 
said identical tables where said quantized parameter may be found, 
computing a zero-state response vector from the vector output
of a cascaded filter comprising a scaling unit, synthesis filter
and weighting filter identical in operation to said scaling unit,
synthesis filter and weighting filter used for encoding said input
vectors, said zero-state response vector being computed for each
vector in said permanent codebook by first setting to zero the
initial condition of said cascaded filter so that the response
computed is not influenced by a preceding one of said codebook
vectors processed by said cascaded filter, and then using said
quantized values of said residual gain factor, set of linear
predictive coefficient parameters, and said set of perceptual
weighting parameters computed in said initial analysis step by
processing each vector in said permanent codebook through said
zero-input response filter to compute a zero-state response
vector, and storing each zero-state response vector computed in a
zero-state response codebook at or together with an index
corresponding to the index of said vector in said permanent
codebook used for this zero-state response computation step, and
after thus performing an initial analysis of and computing a
zero-state response codebook for each successive frame of input speech
or audio vectors, encode each input vector s_n of a frame in
sequence by transmitting the codebook index of the vector in said
permanent codebook which corresponds to the index of a zero-state
response vector in said zero-state response codebook that best
matches a vector v_n obtained from an input vector s_n by

subtracting a long term pitch prediction vector ŝ_n from the
input vector s_n to produce a difference vector d_n and filtering
said difference vector d_n by said perceptual weighting filter to
produce a final input vector f_n, where said long term pitch
prediction ŝ_n is computed by taking a vector from said permanent
codebook at the address specified by the preceding particular
index transmitted as a compressed vector code and performing gain
scaling of this vector using said quantized gain factor QG, then
synthesis filtering the vector obtained from said scaling using
said quantized values QLPC of said set of linear predictive
coefficient parameters to obtain a vector d̂_n and from vector d̂_n
producing a long term pitch predicted vector ŝ_n of the next input
vector s_n through a pitch synthesis filter using said quantized
values of pitch predictor parameters QP and QPP, said long term
prediction vector ŝ_n being a prediction of the next input vector
s_n, and

producing said vector v_n by subtracting from said final input
vector f_n the vector output of said zero-input response filter
generated in response to a permanent codebook vector at the
codebook address of the last transmitted index code, said vector
output being generated by processing through said zero-input
response filter said permanent codebook vector located at said
last transmitted index code, where the output of said zero-input
response filter is discarded while said permanent codebook vector
located at said last transmitted index code is being processed
sample by sample in sequence into said zero-input response filter
until all samples of said codebook vector have been entered, and
where the input of said zero-input response filter is interrupted
after all samples of said codebook vector have been entered and
then the desired vector output from said zero-input response
filter is processed out sample by sample for subtraction from said
final vector f_n, and

for each input vector s_n in a frame, finding the vector
stored in said zero-state response codebook which best matches the
vector v_n, thereby finding the best match of a codebook vector
with an input vector, using an estimate vector ŝ_n produced from
the best match codebook vector found for the preceding input
vector,

having found the best match of said vector v_n with a
zero-state response vector in said zero-state response codebook for an
input speech or audio vector s_n, transmit the zero-state response
codebook index of the current best-match zero-state response
vector as a compressed vector code of the current input vector,
and also use said index of the current best-match zero-state
response vector to select a vector from said permanent codebook
for computing said long term pitch predicted input vector ŝ_n to be
subtracted from the next input vector s_n of the frame.

According to another broad aspect of the invention there
is provided a postfiltering method for enhancing digitally
processed speech or audio signals comprising the steps of
buffering said speech or audio signals into frames of vectors,
each vector having K successive samples,

performing analysis of said buffered frames of speech or
audio signals in predetermined blocks to compute linear predictive
coefficients, pitch and pitch predictor parameters, and
filtering each vector with long-delay and short-delay
postfiltering in cascade, said long-delay postfiltering being
controlled by said pitch and pitch predictor parameters and said
short-delay postfiltering being controlled by said linear
predictive coefficient parameters, wherein postfiltering is
accomplished by using a transfer function for said short-delay
postfilter of the form

$$\frac{1 - P(z/\beta)}{1 - P(z/\alpha)}, \qquad 0 < \beta < \alpha < 1$$

where z is the inverse of the unit delay operator z^{-1} used in the
z transform representation of transfer functions, and α and β are
fixed scaling factors.
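
To make the short-delay postfilter form concrete, the sketch below applies [1 - P(z/β)]/[1 - P(z/α)] as a difference equation. It assumes that P(z) is the LPC predictor polynomial, P(z) = sum of a_i z^-i for i = 1..p, so that P(z/α) has coefficients a_i α^i. That definition is not spelled out at this point in the text and is an assumption here, as are the function names. The default values α = 0.8 and β = 0.5 are the ones quoted for FIG. 6.

```python
def bandwidth_expand(lpc, factor):
    """Coefficients of P(z/factor): a_i * factor**i for i = 1..p."""
    return [a * factor ** (i + 1) for i, a in enumerate(lpc)]


def short_delay_postfilter(x, lpc, alpha=0.8, beta=0.5):
    """Filter x with H(z) = [1 - P(z/beta)] / [1 - P(z/alpha)].

    Difference equation:
        y[n] = x[n] - sum_i num_i * x[n-i] + sum_i den_i * y[n-i]
    """
    num = bandwidth_expand(lpc, beta)    # zeros, from 1 - P(z/beta)
    den = bandwidth_expand(lpc, alpha)   # poles, from 1 - P(z/alpha)
    p = len(lpc)
    x_hist = [0.0] * p   # past inputs, newest first
    y_hist = [0.0] * p   # past outputs, newest first
    out = []
    for xn in x:
        yn = xn
        yn -= sum(b * xh for b, xh in zip(num, x_hist))
        yn += sum(a * yh for a, yh in zip(den, y_hist))
        x_hist = [xn] + x_hist[:-1]
        y_hist = [yn] + y_hist[:-1]
        out.append(yn)
    return out


# Impulse response for a hypothetical first-order predictor a_1 = 0.9.
impulse_response = short_delay_postfilter([1.0, 0.0, 0.0], [0.9])
```

When α equals β the numerator and denominator cancel and the filter passes the signal unchanged, which is a convenient sanity check on an implementation.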

Other modifications and variations to this invention may
occur to those skilled in the art, such as variable-frame-rate
coding, fast codebook searching, reversal of the order of pitch
prediction and LPC prediction, and use of alternative perceptual
weighting techniques. Consequently, the claims which define the
present invention are intended to encompass such modifications and
variations.

Although the purpose of this invention is to encode for
transmission and/or storage of analog speech or audio waveforms
for subsequent reconstruction of the waveforms upon reproduction
of the speech or audio program, reference is made hereinafter only
to speech, but the invention described and claimed is applicable
to audio waveforms or to sub-band filtered speech or audio
waveforms.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1a is a block diagram of a Vector Adaptive
Predictive Coding (VAPC) processor embodying the
present invention, and FIG. 1b is a block diagram of
a receiver for the encoded speech transmitted by the
system of FIG. 1a.

FIG. 2 is a schematic diagram that illustrates
the adaptive computation of vectors for a zero-state
response codebook in the system of FIG. 1a.

FIG. 3 is a block diagram of an analysis
processor in the system of FIG. 1a.

FIG. 4 is a block diagram of an adaptive
postfilter of FIG. 1b.

FIG. 5 illustrates the LPC spectrum and the
corresponding frequency response of an all-pole
postfilter 1/[1-P(z/α)] for different values of α. The
offset between adjacent plots is 20 dB.

FIG. 6 illustrates the frequency responses of
the postfilter [1-μz^{-1}][1-P(z/β)]/[1-P(z/α)]
corresponding to the LPC spectrum shown in FIG. 5. In
both plots, α=0.8 and β=0.5. The offset between the
two plots is 20 dB.

DESCRIPTION OF PREFERRED EMBODIMENTS 

The preferred mode of implementation
contemplates using programmable digital signal processing
chips, such as one or two AT&T DSP32 chips, and
auxiliary chips for the necessary memory and controllers
for such equipment as input sampling, buffering and
multiplexing. Since the system is digital, it is
synchronized throughout with the samples. For
simplicity of illustration and explanation, the
synchronizing logic is not shown in the drawings. Also
for simplification, at each point where a signal
vector is subtracted from another, the subtraction function is
symbolically indicated by an adder represented by a plus sign
within a circle. The vector being subtracted is on the input
labeled with a minus sign. In practice, the two's complement of
the subtrahend is formed and added to the minuend. However,
although the preferred implementation contemplates programmable
digital signal processors, it would be possible to design and
fabricate special integrated circuits using VLSI techniques to
implement the present invention as a special purpose, dedicated
digital signal processor once the quantities needed would justify
the initial cost of design.

Referring to FIG. 1a, original speech samples in digital
form from sampling analog-to-digital converter 10 are received by
an analysis processor 11 which partitions them into vectors s_n of
K samples per vector, and into frames of N vectors per frame. The
analysis processor stores the samples in a dual buffer memory
which has the capacity for storing more than one frame of vectors,
for example two frames of 8 vectors per frame, each vector
consisting of 20 samples, so that the analysis processor may
compute parameters used for coding the stored frame. As each
frame is being processed out of one buffer, a new frame coming in
is stored in the other buffer so that when processing of a frame
has been completed, there is a new frame buffered and ready to be
processed.
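
The partitioning of samples into frames of N vectors with K samples each can be illustrated with a small helper. The values N = 8 and K = 20 are the example values given above; the function name `partition_into_frames` is hypothetical, and the sketch only shows the grouping, not the dual-buffer timing.

```python
def partition_into_frames(samples, K=20, N=8):
    """Split a sample sequence into frames of N vectors of K samples each.

    Samples that do not yet fill a whole frame are returned separately,
    much as they would sit in a buffer that is still filling.
    """
    frame_len = K * N
    full = len(samples) // frame_len
    frames = []
    for f in range(full):
        start = f * frame_len
        frame = [samples[start + v * K: start + (v + 1) * K]
                 for v in range(N)]
        frames.append(frame)
    leftover = samples[full * frame_len:]
    return frames, leftover


# 400 samples give two complete 160-sample frames and 80 samples left over.
frames, leftover = partition_into_frames(list(range(400)))
```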

The analysis processor 11 determines the parameters of
filters employed in the Vector Adaptive Predictive Coding (VAPC)
technique that is the subject of this invention. These parameters
are transmitted through a multiplexer 12 as side information just
ahead of the frame of vector codes generated with the use of a
permanent vector quantized (VQ) codebook 13 and a zero-state
response (ZSR) codebook 14. The side information conditions the
receiver to properly filter decoded vectors of the frame. The
analysis processor 11 also computes other parameters used in the
encoding process. The latter are represented in FIG. 1a by
labeled lines, and consist of sets of parameters which are
designated W for a perceptual weighting filter 18, a quantized LPC
predictor QLPC for an LPC synthesis filter 15, and quantized pitch
QP and pitch predictor QPP for a pitch synthesis filter 16. Also
computed by the analysis processor is a scaling factor G that is
quantized to QG for control of a scaling unit 17. The four
quantized parameters transmitted as side information are encoded
for transmission using a quantizing table as the quantized pitch
index, pitch predictor index, LPC predictor index and gain index.
The manner in which the analysis processor computes all of these
parameters will be described with reference to FIG. 3.

The multiplexer 12 preferably transmits the side
information as soon as it is available, although it could follow
the frame of encoded input vectors, and while that is being done,
M zero-state response vectors are computed for the zero-state
response (ZSR) codebook 14 in a manner illustrated in FIG. 2,
which is to process each vector in the VQ codebook 13, e.g., 128
vectors, through a gain scaling unit 17', an LPC synthesis filter
15', and perceptual weighting filter 18' corresponding to the
gain scaling unit 17, the LPC synthesis filter 15, and perceptual
weighting filter 18 in the transmitter (FIG. 1a). Ganged
commutating switches S1 and S2 are shown to signify that each
fixed VQ vector processed is stored in memory locations of the
same index (address) in the ZSR codebook.
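
As a rough check on the earlier statement that VQ can operate below one bit per sample, the arithmetic below uses the example values given in this description (M = 128 codebook vectors, K = 20 samples per vector). The 8 kHz sampling rate is an assumption, since no sampling rate is stated in the text, and the bits spent on side information are not counted.

```python
import math


def vq_index_rate(M=128, K=20, sample_rate_hz=8000):
    """Bits spent on VQ indices only (side information excluded)."""
    bits_per_vector = math.log2(M)            # one index per K-sample vector
    bits_per_sample = bits_per_vector / K
    bits_per_second = bits_per_vector * sample_rate_hz / K
    return bits_per_vector, bits_per_sample, bits_per_second


bits_per_vector, bits_per_sample, bits_per_second = vq_index_rate()
```

With these numbers each index costs 7 bits, or 0.35 bit per sample, which a scalar quantizer cannot reach because it needs at least one bit for every sample.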

At the beginning of each codebook vector processing, the
initial conditions of the cascaded filters 15' and 18' are set to
zero. This simulates what the cascaded filters 15' and 18' will
do with no previous vector present from the corresponding VQ
codebook. Thus, if the output of a zero-input response filter 19
in the transmitter (FIG. 1a) is held or stored at each step of
computing the VQ code index (to transmit for each vector of a
frame), it is possible to simplify encoding the speech vectors by
subtracting the zero-input response output from the vector f_n. In
other words, assuming M = 128, there are 128 different vectors
permanently stored in the VQ codebook to use in coding the
original speech vectors s_n. Then every one of the 128 VQ vectors
is read out in sequence, fed through the scaling unit 17', the LPC
synthesis filter 15', and the perceptual weighting filter 18'
shown in FIG. 2 without any history of previous vector inputs,
i.e., without any ringing due to excitation by a preceding vector,
by resetting those filters at each step. The resulting filter
output vector is then stored in a corresponding location in the
zero-state response codebook 14. Later, while encoding input
signal vectors s_n by finding the best match between a vector v_n
and all of the zero-state response vector codes, it is necessary
to subtract from a vector f_n derived from the perceptual weighting
filter a value that corresponds to the effect of the previously
selected VQ vector. That is done through the zero-input response
filter 19. The index (address) of the best match is used as the
compressed vector code transmitted for the vector s_n. Of the 128
zero-state response vectors, there will be only one that provides
the best match, i.e., least distortion. Assume it is in location
38 of the zero-state response codebook as determined by a computer
20 labeled "compute norm." An address register 20a will store the
index 38. It is that index that is then transmitted as a VQ index
to the receiver shown in FIG. 1b.
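
The construction of the zero-state response codebook described above might be sketched as follows. The helper `filter_block` is a generic IIR filter written for this illustration, and the filter coefficients in the example are hypothetical. The exact forms of the LPC synthesis and perceptual weighting filters are not given at this point in the text, so both are passed in simply as numerator and denominator coefficient lists.

```python
def filter_block(b, a, x, state=None):
    """Direct-form II transposed IIR filter, y = (B(z)/A(z)) x, a[0] == 1.

    Returns (y, new_state). state=None means zero initial conditions.
    """
    order = max(len(a), len(b)) - 1
    b = list(b) + [0.0] * (order + 1 - len(b))
    a = list(a) + [0.0] * (order + 1 - len(a))
    z = [0.0] * order if state is None else list(state)
    y = []
    for xn in x:
        yn = b[0] * xn + (z[0] if order else 0.0)
        for i in range(order - 1):
            z[i] = b[i + 1] * xn + z[i + 1] - a[i + 1] * yn
        if order:
            z[order - 1] = b[order] * xn - a[order] * yn
        y.append(yn)
    return y, z


def build_zsr_codebook(vq_codebook, gain, lpc_a, w_b, w_a):
    """One zero-state response vector per VQ codebook vector, same index."""
    zsr = []
    for code in vq_codebook:
        scaled = [gain * c for c in code]               # scaling unit
        synth, _ = filter_block([1.0], lpc_a, scaled)   # memory reset to zero
        weighted, _ = filter_block(w_b, w_a, synth)     # memory reset to zero
        zsr.append(weighted)
    return zsr


# Toy example: two 3-sample code vectors, 1/(1 - 0.5 z^-1) synthesis filter,
# and a pass-through weighting filter (all values hypothetical).
zsr_codebook = build_zsr_codebook([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
                                  gain=2.0, lpc_a=[1.0, -0.5],
                                  w_b=[1.0], w_a=[1.0])
```

Since the filter parameters change every frame, a table of this kind has to be rebuilt once per frame, while the VQ codebook itself stays fixed.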

In the receiver, a demultiplexer 21 separates the side
information which conditions the receiver with the same parameters
as corresponding filters and scaling unit of the transmitter. The
receiver uses a decoder 22 to translate the parameter indices to
parameter values. The VQ index for each successive vector in the
frame addresses a VQ codebook 23 which is identical to the fixed
VQ codebook 13 of the transmitter. The LPC synthesis filter 24,
pitch synthesis filter 25, and scaling unit 26 are conditioned by
the same parameters which were used in computing the zero-state
codebook values, and which were in turn used in the process of
selecting the encoding index for each input vector. At each step
of finding and transmitting an encoding index, the zero-input
response filter 19 computes from the VQ vector at the location of
the index transmitted a value to be subtracted from the input
vector f_n to present a zero-input response to be used in the
best-match search.

There are various procedures that may be used to
determine the best match for an input vector s_n. The simplest is
to store the resulting distortion between each zero-state response
vector code output and the vector v_n with the index of that
zero-state response vector code. Assuming there are 128 vector codes
stored in the codebook 14, there would then be 128 resulting
distortions stored in a computer 20. Then, after all have been
stored, a search is made in the computer 20 for the lowest
distortion value. The index (address) of that lowest distortion
value is then stored in a register 20a and transmitted to the
receiver as an encoded vector via the multiplexer 12, and to the
VQ codebook for reading the corresponding VQ vector to be used in
the processing of the next input vector s_n.
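
The simplest search procedure described above (compute every distortion, then pick the lowest) can be sketched as below. The squared Euclidean distance is assumed as the distortion measure, which the text only calls a computed "norm", and the function name `best_match_index` is hypothetical.

```python
def best_match_index(zsr_codebook, v):
    """Return (index, distortion) of the ZSR vector closest to v.

    Distortion is taken to be the squared Euclidean distance (an assumption).
    """
    distortions = []
    for z in zsr_codebook:
        distortions.append(sum((zi - vi) ** 2 for zi, vi in zip(z, v)))
    # Full search over all stored distortions for the smallest value.
    best = min(range(len(distortions)), key=distortions.__getitem__)
    return best, distortions[best]


# Toy example with three 2-sample ZSR vectors.
index, distortion = best_match_index([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]],
                                     [0.9, 1.2])
```

The returned index is what would be sent to the multiplexer, and it also selects the VQ vector used to prepare the prediction for the next input vector.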

In summary, it should be noted that the VQ codebook is
used (accessed) in two different steps: first, to compute vector
codes for the zero-state response codebook at the beginning of
each frame, using the LPC synthesis and perceptual weighting
filter parameters determined for the frame; and second, to excite
the filters 15 and 16 through the scaling unit 17 while searching
for the index of the best-match vector, during which the estimate
ŝ_n thus produced is subtracted from the input vector s_n. The
difference d_n is used in the best-match search.

As the best match for each input vector s_n is found, the
corresponding predetermined and fixed vector from the VQ codebook
is used to reset the zero-input response filter 19 for the next
vector of the frame. The function of the zero-input response
filter 19 is thus to find the residual response of the gain
scaling unit 17' and filters 15' and 18' to previously selected
vectors from the VQ codebook. Thus, the selected vector is not
transmitted; only its index is transmitted. At the receiver, its
index is used to read out the selected vector from a VQ codebook
23 identical to the VQ codebook 13 in the transmitter.

The zero-input response filter 19 is the same filtering
operation that is used to generate the ZSR codebook 14, namely the
combination of a gain G, an LPC synthesis filter and a weighting
filter, as shown in FIG. 2. Once a best codebook vector match is
determined, the best-match vector is applied as an input to this
filter (sample by sample, sequentially). An input switch s_in is
closed and an output switch s_out is open during this time so that
the first K output samples are ignored. (K is the dimension of
the vector and a typical value of K is 20.) As soon as all K
samples have been applied as inputs to the filter 19, the filter
input switch s_in is opened and the output switch s_out is closed.
The next K samples of the vector f_n, the output of the perceptual
weighting filter, begin to arrive, and the K output samples that
the filter 19 produces in response to the codebook vector are
subtracted from them. The difference so generated is a
set of K samples forming the vector v_n, which is stored in a static
register for use in the ZSR codebook search procedure. In the ZSR
codebook search procedure, the vector v_n is subtracted from each
vector stored in the ZSR codebook, and the difference vector Δ is
fed to the computer 20 together with the index (or stored in the
same order, thereby to imply the index of the vector out of the
ZSR codebook). The computer 20 then determines which difference
is the smallest, i.e., which is the best match between the vector
v_n and each vector stored temporarily (for one frame of input
vectors s_n) in the ZSR codebook. The index of that best-match
vector is stored in a
register 20a. That index is transmitted as a vector code and used
to address the VQ codebook to read the vector stored there into
the scaling unit 17, as noted above. This search process is
repeated for each vector in the ZSR codebook, each time using the
same vector v_n. Then the best vector is determined.
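
The split between zero-state and zero-input responses rests on linearity: the filter output for a new vector equals the response to that vector from zeroed memory plus the ringing left over from earlier vectors. The Python sketch below illustrates this under simplifying assumptions. The class `AllPoleFilter` is a hypothetical stand-in for the whole gain, LPC synthesis and weighting cascade, and the ringing is computed on a copy of the filter so that the running memory is not disturbed, which is a convenience of this sketch rather than the switch arrangement of filter 19.

```python
import copy


class AllPoleFilter:
    """Hypothetical stand-in for the gain/LPC-synthesis/weighting cascade.

    Implements y[n] = x[n] + sum_k a[k] * y[n-1-k].
    """

    def __init__(self, a):
        self.a = list(a)
        self.mem = [0.0] * len(self.a)  # past outputs, most recent first

    def reset(self):
        self.mem = [0.0] * len(self.a)

    def step(self, x):
        y = x + sum(ak * mk for ak, mk in zip(self.a, self.mem))
        if self.mem:
            self.mem = [y] + self.mem[:-1]
        return y

    def run(self, xs):
        return [self.step(x) for x in xs]


def zero_state_codebook(vq_codebook, gain, a):
    """Filter every scaled VQ vector starting from zeroed filter memory."""
    filt = AllPoleFilter(a)
    zsr = []
    for code_vector in vq_codebook:
        filt.reset()  # zero initial condition before each vector
        zsr.append(filt.run([gain * x for x in code_vector]))
    return zsr


def zero_input_response(filt, k):
    """Ring a copy of the filter with k zero inputs; filt itself is unchanged."""
    ringing = copy.deepcopy(filt)
    return ringing.run([0.0] * k)
```

In this sketch the vector v_n would be formed by subtracting the list returned by `zero_input_response` from f_n, after which only the precomputed zero-state responses need to be compared against v_n.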

Referring now to FIG. 1b, it should be noted that the
output of the VQ codebook 23, which precisely duplicates the VQ
codebook 13 of the transmitter, is identical to the vector
extracted from the best-match index applied as an address to the
VQ codebook 13; the gain unit 26 is identical to the gain unit 17
in the transmitter, and filters 24 and 25 exactly duplicate the
filters 15 and 16, respectively, except that at the receiver, the
approximation s̃_n rather than the prediction ŝ_n is taken as the
output of the pitch synthesis filter 25. The result, after
converting from digital to analog form, is synthesized speech that
reproduces the original speech with very good quality.

It has been found that by applying an adaptive
postfilter 30 to the synthesized speech before converting it from
digital to analog form, the perceived coding noise may be greatly
reduced without introducing significant distortion in the filtered
speech. FIG. 4 illustrates the organization of the adaptive
postfilter as a long-delay filter 31 and a short-delay filter 32.
Both filters are adaptive in that the parameters used in them are
those received as side information from the transmitter, except
for the gain parameter, G. The basic idea of adaptive
postfiltering is to attenuate the frequency components of the
coded speech in spectral valley regions. At low bit rates, a
considerable amount of perceived coding noise comes from spectral
valley regions where there are no strong resonances to mask the
noise. The postfilter attenuates the noise components in spectral
valley regions to make the coding noise less perceivable.
However, such a filtering operation inevitably introduces some
distortion to the shape of the speech spectrum. Fortunately, our
ears are not very sensitive to distortion in spectral valley
regions; therefore, adaptive postfiltering only introduces
very slight distortion in perceived speech, but it
significantly reduces the perceived noise level. The
adaptive postfilter will be described in greater
detail after first describing in more detail the
analysis of a frame of vectors to determine the side
information.

Referring now to FIG. 3, it shows the organization
of the initial analysis of block 11 in FIG.
1a. The input speech samples s_n are first stored in a
buffer 40 capable of storing, for example, more than
one frame of 8 vectors, each vector having 20 samples.

Once a frame of input vectors s_n has been
stored, the parameters to be used, and their indices
to be transmitted as side information, are determined
from that frame and at least a part of the previous
frame in order to perform analysis with information
from more than the frame of interest. The analysis
is carried out as shown using a pitch detector 41,
pitch quantizer 42 and a pitch predictor coefficient
quantizer 43. What is referred to as "pitch" applies
to any observed periodicity in the input signal,
which may not necessarily correspond to the classical
use of "pitch" corresponding to vibrations in the
human vocal folds. The direct output of the speech
is also used in the pitch predictor coefficient
quantizer 43. The quantized pitch (QP) and quantized
pitch predictor (QPP) are used to compute a
pitch-prediction residual in block 44, and as control
parameters for the pitch synthesis filter 16 used as a
predictor in FIG. 1a. Only a pitch index and a pitch
prediction index are included in the side information
to minimize the number of bits transmitted. At the
receiver, the decoder 22 will use each index to produce
the corresponding control parameters for the
pitch synthesis filter 25.

The pitch-prediction residual is stored in a
buffer 45 for LPC analysis in block 46. The LPC
predictor from the LPC analysis is quantized in block
47. The index of the quantized LPC predictor is
transmitted as a third one of four pieces of side
information, while the quantized LPC predictor is
used as a parameter for control of the LPC synthesis
filter 15, and in block 48 to compute the rms value
of the LPC predictive residual. This value (unquantized
residual gain) is then quantized in block 49 to
provide gain control G in the scaling unit 17 of FIG.
1a. The index of the quantized residual gain is the
fourth part of the side information transmitted.

In addition to the foregoing, the analysis
section provides LPC analysis in block 50 to produce
an LPC predictor from which the set of parameters W
for the perceptual weighting filter 18 (FIG. 1a) is
computed in block 51.

The adaptive postfilter 30 in FIG. 1b will now
be described with reference to FIG. 4. It consists
of a long-delay filter 31 and a short-delay filter 32
in cascade. The long-delay filter is derived from
the decoded pitch-predictor information available at
the receiver. It attenuates frequency components
between pitch harmonic frequencies. The short-delay
filter is derived from LPC predictor information, and
it attenuates the frequency components between formant
frequencies.

The noise masking effect of human auditory
perception, recognized by M.R. Schroeder, B.S. Atal,
and J.L. Hall, "Optimizing Digital Speech Coders by
Exploiting Masking Properties of the Human Ear," J.
Acoust. Soc. Am., Vol. 66, No. 6, pp. 1647-1652,
December 1979, is exploited in VAPC by using noise
spectral shaping. However, in noise spectral shaping,
lowering noise components at certain frequencies can
only be achieved at the price of increased noise
components at other frequencies. [B.S. Atal and M.R.
Schroeder, "Predictive Coding of Speech Signals and
Subjective Error Criteria," IEEE Trans. Acoust.,
Speech, and Signal Processing, Vol. ASSP-27, No. 3,
pp. 247-254, June 1979] Therefore, at bit rates as
low as 4800 bps, where the average noise level is
quite high, it is very difficult, if not impossible,
to force noise below the masking threshold at all
frequencies. Since speech formants are much more
important to perception than spectral valleys, the
approach of the present invention is to preserve the
formant information by keeping the noise in the formant
regions as low as is practical during encoding.
Of course, in this case, the noise components in
spectral valleys may exceed the threshold; however,
these noise components can be attenuated later by the
postfilter 30. In performing such postfiltering, the
speech components in spectral valleys will also be
attenuated. Fortunately, the limen, or "just noticeable
difference," for the intensity of spectral valleys
can be quite large [J.L. Flanagan, Speech
Analysis, Synthesis, and Perception, Academic Press,
New York, 1972]. Therefore, by attenuating the components
in spectral valleys, the postfilter only introduces
minimal distortion in the speech signal, but it
achieves a substantial noise reduction.

Adaptive postfiltering has been used successfully
in enhancing ADPCM-coded speech. See V.
Ramamoorthy and N.S. Jayant, "Enhancement of ADPCM
Speech by Adaptive Postfiltering," AT&T Bell Labs
Tech. J., pp. 1465-1475, October 1984; and N.S.
Jayant and V. Ramamoorthy, "Adaptive Postfiltering of
16 kb/s-ADPCM Speech," Proc. ICASSP, pp. 829-832,
Tokyo, Japan, April 1986. The postfilter used by
Ramamoorthy, et al., supra, is derived from the two-
pole six-zero ADPCM synthesis filter by moving the
poles and zeros radially toward the origin. If this
idea is extended directly to an all-pole LPC synthesis
filter 1/[1-P(z)], the result is 1/[1-P(z/α)] as
the corresponding postfilter, where 0<α<1. Such an
all-pole postfilter indeed reduces the perceived
noise level; however, sufficient noise reduction can
only be achieved with severe muffling in the filtered
speech. This is due to the fact that the frequency
response of this all-pole postfilter generally has a
lowpass spectral tilt for voiced speech.

The spectral tilt of the all-pole postfilter
1/[1-P(z/α)] can be easily reduced by adding zeros
having the same phase angles as the poles but with
smaller radii. The transfer function of the resulting
pole-zero postfilter 32a has the form

$$H(z) = \frac{1 - P(z/\beta)}{1 - P(z/\alpha)}, \qquad 0 < \beta < \alpha < 1 \tag{1}$$

where α and β are coefficients empirically determined,
with some tradeoff between spectral peaks
being so sharp as to produce chirping and being so
low as to not achieve any noise reduction. The frequency
response of H(z) can be expressed as

$$20 \log |H(e^{j\omega})| = 20 \log \frac{1}{|1 - P(e^{j\omega}/\alpha)|} - 20 \log \frac{1}{|1 - P(e^{j\omega}/\beta)|} \tag{2}$$

Therefore, in logarithmic scale, the frequency response
of the pole-zero postfilter H(z) is simply the
difference between the frequency responses of two
all-pole postfilters.
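
To make Equation 1 concrete, the sketch below applies the pole-zero postfilter in the time domain. It assumes, as is conventional but not stated in this passage, that the LPC predictor has the form P(z) = a_1 z^-1 + ... + a_M z^-M, so that P(z/α) simply scales the i-th coefficient by α^i. The function name `short_delay_postfilter` and the list-based signal representation are illustrative choices only.

```python
def short_delay_postfilter(x, lpc, alpha=0.8, beta=0.5):
    """Apply H(z) = [1 - P(z/beta)] / [1 - P(z/alpha)] to the samples x.

    Assumption: P(z) = sum_i lpc[i-1] * z**(-i), i = 1..M.
    """
    # Bandwidth-expanded coefficients: a_i * beta**i and a_i * alpha**i.
    num = [a * beta ** (i + 1) for i, a in enumerate(lpc)]
    den = [a * alpha ** (i + 1) for i, a in enumerate(lpc)]
    y = []
    for n in range(len(x)):
        acc = x[n]
        for i in range(len(lpc)):
            if n - 1 - i >= 0:
                acc -= num[i] * x[n - 1 - i]  # zeros: 1 - P(z/beta)
                acc += den[i] * y[n - 1 - i]  # poles: 1 / (1 - P(z/alpha))
        y.append(acc)
    return y
```

Setting `alpha` equal to `beta` cancels the poles against the zeros and returns the input unchanged, which is a quick sanity check on the coefficient scaling.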

Typical values of α and β are 0.8 and 0.5,
respectively. From FIG. 5, it is seen that the response
for α=0.8 has both formant peaks and spectral
tilt, while the response for α=0.5 has spectral tilt
only. Thus, with α=0.8 and β=0.5 in Equation 2, we
can at least partially remove the spectral tilt by
subtracting the response for α=0.5 from the response
for α=0.8. The resulting frequency response of H(z)
is shown in the upper plot of FIG. 6.

In informal listening tests, it has been found
that the muffling effect was significantly reduced
after the numerator term [1-P(z/β)] was included in
the transfer function H(z). However, the filtered
speech remained slightly muffled even with the
spectral-tilt compensating term [1-P(z/β)]. To further
reduce the muffling effect, a first-order filter 32b
was added which has a transfer function of [1-μz^-1],
where μ is typically 0.5. Such a filter provides a
slightly highpassed spectral tilt and thus helps to
reduce muffling. This first-order filter is used in
cascade with H(z), and a combined frequency response
with μ=0.5 is shown in the lower plot of FIG. 6.

The short-delay postfilter 32 just described
basically amplifies speech formants and attenuates
inter-formant valleys. To obtain the ideal postfilter
frequency response, we also have to amplify
the pitch harmonics and attenuate the valleys between
harmonics. Such a characteristic of frequency response
can be achieved with a long-delay postfilter
using the information in the pitch predictor.

In VAPC, we use a three-tap pitch predictor;
the pitch synthesis filter corresponding to such a
pitch predictor is not guaranteed to be stable. Since
the poles of such a synthesis filter may be outside
the unit circle, moving the poles toward the origin
may not have the same effect as in a stable LPC synthesis
filter. Even if the three-tap pitch synthesis
filter is stabilized, its frequency response may have
an undesirable spectral tilt. Thus, it is not suitable
to obtain the long-delay postfilter by scaling
down the three tap weights of the pitch synthesis
filter.

With both poles and zeroes, the long-delay
postfilter can be chosen as

$$H_l(z) = C_g \frac{1 + \gamma z^{-p}}{1 - \lambda z^{-p}} \tag{3}$$

where p is determined by pitch analysis, and C_g is an
adaptive scaling factor.

Knowing the information provided by a single
or three-tap pitch predictor as the value b_2 or the
sum b_1+b_2+b_3, the factors γ and λ are determined
according to the following formulas:

$$\gamma = C_z f(x), \qquad \lambda = C_p f(x), \qquad 0 < C_z, C_p < 1 \tag{4}$$

where

$$f(x) = \begin{cases} 1 & \text{if } x > 1 \\ x & \text{if } U_{th} \le x \le 1 \\ 0 & \text{if } x < U_{th} \end{cases} \tag{5}$$

where U_th is a threshold value (typically 0.6) determined
empirically, and x can be either b_2 or b_1+b_2+b_3
depending on whether a one-tap or a three-tap pitch
predictor is used. Since a quantized three-tap pitch
predictor is preferred and therefore already available
at the VAPC receiver, x is chosen as

$$x = \sum_{i=1}^{3} b_i$$

in VAPC postfiltering. On the other hand, if the
postfilter is used elsewhere to enhance noisy input
speech, a separate pitch analysis is needed, and x
may be chosen as a single value b_2 since a one-tap
pitch predictor suffices. (The value b_2 when used
alone indicates a value from a single-tap predictor,
which in practice would be the same as a three-tap
predictor when b_1 and b_3 are set to zero.)
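
The mapping from pitch predictor taps to the postfilter factors in Equations 4 and 5 can be sketched as follows. The function names are hypothetical, and no values for C_z and C_p are given in the text, so they are left as required arguments; only the threshold default of 0.6 comes from the description.

```python
def voicing_function(x, u_th=0.6):
    """Equation 5: clip x to 1 above 1, pass it through in [u_th, 1], zero it below u_th."""
    if x > 1.0:
        return 1.0
    if x < u_th:
        return 0.0
    return x


def long_delay_factors(taps, c_z, c_p, u_th=0.6):
    """Equation 4: return (gamma, lam, x) for the given pitch predictor taps.

    taps holds (b_1, b_2, b_3) for a three-tap predictor or (b_2,) for one tap.
    """
    x = sum(taps)
    fx = voicing_function(x, u_th)
    return c_z * fx, c_p * fx, x
```

For weakly periodic frames, where the tap sum falls below the threshold, both factors become zero and the long-delay filter reduces to a pure gain.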

The goal is to make the power of {y(n)} about
the same as that of {s(n)}. An appropriate scaling
factor is chosen as

$$C_g = \frac{1 - \lambda/x}{1 + \gamma/x} \tag{6}$$
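
A time-domain sketch of the long-delay postfilter of Equation 3, with the scaling factor of Equation 6, is given below. The function name and argument layout are hypothetical. The handling of the unvoiced case, where γ and λ are both zero and the factor is taken as 1, is an assumption made here to avoid dividing by a possibly zero x; the text does not spell that case out.

```python
def long_delay_postfilter(s, p, gamma, lam, x):
    """Apply H_l(z) = C_g * (1 + gamma*z**-p) / (1 - lam*z**-p) to samples s.

    p is the pitch lag in samples; x is the voicing indicator (tap sum).
    """
    if gamma == 0.0 and lam == 0.0:
        c_g = 1.0  # assumption: the filter degenerates to unity gain
    else:
        c_g = (1.0 - lam / x) / (1.0 + gamma / x)  # Equation 6
    y = []
    for n in range(len(s)):
        s_delayed = s[n - p] if n >= p else 0.0
        y_delayed = y[n - p] if n >= p else 0.0
        y.append(c_g * (s[n] + gamma * s_delayed) + lam * y_delayed)
    return y
```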

The first-order filter 32b can also be made
adaptive to better track the change in the spectral
tilt of H(z). However, it has been found that even a
fixed filter with μ=0.5 gives quite satisfactory
results. A fixed value of μ may be determined empirically.

To avoid occasional large gain excursions, an automatic
gain control (AGC) was added at the output of the adaptive
postfilter. The purpose of AGC is to scale the enhanced speech
such that it has roughly the same power as the unfiltered noisy
speech. It is comprised of a gain (square root of power)
estimator 33 operating on the speech input s_n, a gain (square root
of power) estimator 34 operating on the postfiltered output r(n),
and a circuit 35 to compute a scaling factor as the ratio of the
two gains. The postfiltering output r(n) is then multiplied by
this ratio in a multiplier 36. AGC is thus achieved by estimating
the square root of the power of the unfiltered and filtered speech
separately and then using the ratio of the two values as the
scaling factor. Let {s(n)} be the sequence of either unfiltered
or filtered speech samples; then, the speech power σ²(n) is
estimated by using

$$\sigma^2(n) = \zeta \sigma^2(n-1) + (1 - \zeta) s^2(n), \qquad 0 < \zeta < 1 \tag{7}$$

A suitable value of ζ is 0.99.
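
The AGC arrangement can be sketched as two running power estimates of the form in Equation 7 and a per-sample gain equal to the ratio of their square roots. The function name `agc` and the small guard constant `eps`, which avoids division by zero before any signal has arrived, are assumptions of this sketch and not part of the description.

```python
import math


def agc(unfiltered, postfiltered, zeta=0.99, eps=1e-12):
    """Scale postfiltered samples to track the power of the unfiltered ones."""
    p_in = 0.0   # running power estimate of the unfiltered speech
    p_out = 0.0  # running power estimate of the postfiltered speech
    out = []
    for s, r in zip(unfiltered, postfiltered):
        p_in = zeta * p_in + (1.0 - zeta) * s * s
        p_out = zeta * p_out + (1.0 - zeta) * r * r
        # Ratio of the two gains (square roots of the power estimates).
        gain = math.sqrt(p_in / p_out) if p_out > eps else 1.0
        out.append(gain * r)
    return out
```

If the postfilter merely halved the amplitude of its input, the two estimates would differ by a factor of four and the computed gain of two would restore the original level.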

The complexity of the postfilter described in this
section is only a small fraction of the overall complexity of the
rest of the VAPC system, or any other coding system that may be
used. In simulations, this postfilter achieves significant noise
reduction with almost negligible distortion in speech. To test
for possible distorting effects, the adaptive postfiltering
operation was applied to clean, uncoded speech and it was found
that the unfiltered original and its filtered version sound
essentially the same, indicating that the distortion introduced by
this postfilter is negligible.

It should be noted that although this novel
postfiltering technique was developed for use with
the present invention, its applications are not restricted
to use with it. In fact, this technique can
be used not only to enhance the quality of any noisy
digital speech signal but also to enhance the decoded
speech of other speech coders when provided with a
buffer and analysis section for determining the parameters.

What has been disclosed is a real-time Vector
Adaptive Predictive Coder (VAPC) for speech or audio
which may be implemented with software using the
commercially available AT&T DSP32 digital processing
chip. In its newest version, this chip has a processing
power of 6 million instructions per second
(MIPS). To facilitate implementation for real-time
speech coding, a simplified version of the 4800 bps
VAPC is available. This simplified version has a
much lower complexity, but gives nearly the same
speech quality as a full complexity version.

In the real-time implementation, an inner-product
approach is used for computing the norm
(smallest distortion) which is more efficient than
the conventional difference-square approach of computing
the mean square error (MSE) distortion. Given
a test vector v and M ZSR codebook vectors, z_j,
j = 1, 2, ..., M, the j-th MSE distortion can be computed as

$$\|v - z_j\|^2 = \|v\|^2 - 2\left[ v^T z_j - \tfrac{1}{2}\|z_j\|^2 \right] \tag{8}$$

At the beginning of each frame, it is possible to
compute and store ½|z_j|². With the DSP32 processor
and for the dimension and codebook size used,
the difference-square approach of the codebook search
requires about 2.5 MIPS to implement, while the inner-product
approach only requires about 1.5 MIPS.
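
Because the term |v|² in Equation 8 is the same for every codebook vector, minimizing the distortion is equivalent to maximizing the bracketed quantity, the inner product of v with z_j minus half the energy of z_j. The sketch below shows this search with the half-energies computed once per frame. The function names are hypothetical, and plain Python lists stand in for the DSP32 data memory.

```python
def precompute_half_energies(zsr_codebook):
    """Once per frame: half the squared norm of every ZSR codebook vector."""
    return [0.5 * sum(z_i * z_i for z_i in z) for z in zsr_codebook]


def inner_product_search(v, zsr_codebook, half_energies):
    """Return the index j maximizing v.z_j - 0.5*|z_j|^2 (minimum MSE match)."""
    best_j = 0
    best_score = float("-inf")
    for j, (z, half_energy) in enumerate(zip(zsr_codebook, half_energies)):
        score = sum(v_i * z_i for v_i, z_i in zip(v, z)) - half_energy
        if score > best_score:
            best_j, best_score = j, score
    return best_j
```

Per codebook vector this needs one multiply-add per sample plus a single subtraction, whereas the difference-square form needs a subtraction and a multiply-add per sample, which is consistent with the reduction in processing load reported above.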

The complexity of the VAPC is only about 3
million multiply-adds/second and 6 k words of data
memory. However, due to the overhead in implementation,
a single DSP32 chip was not sufficient for implementing
the coder. Therefore, two DSP32 chips
were used to implement the VAPC. With a faster DSP32
chip now available, which has an instruction cycle
time of 160 ns rather than 250 ns, it is expected
that the VAPC can be implemented using only one DSP32
chip.

THE EMBODIMENTS OF THE INVENTION IN WHICH AN EXCLUSIVE
PROPERTY OR PRIVILEGE IS CLAIMED ARE DEFINED AS FOLLOWS:



1. An improvement in the method for compressing digitally
encoded input speech or audio vectors at a transmitter by using a
scaling unit controlled by a quantized residual gain factor QG, a
synthesis filter controlled by a set of quantized linear
predictive coefficient parameters QLPC, a pitch predictor
controlled by pitch and pitch predictor parameters QP and QPP, a
weighting filter controlled by a set of perceptual weighting
parameters W, and a permanent indexed codebook containing a
predetermined number M of codebook vectors, each having an
assigned codebook index, to find an index which identifies the
best match between an input speech or audio vector s_n that is to
be coded and a synthesized vector ŝ_n generated from a stored
vector in said indexed codebook, wherein each of said digitally
encoded input vectors consists of a predetermined number K of
digitally coded samples, comprising the steps of

buffering and grouping said input speech or audio vectors 
into frames of vectors with a predetermined number N of vectors in 
each frame, 

performing an initial analysis for each successive frame, 
said analysis including the computation of a residual gain factor 
G, a set of perceptual weighting parameters W, a pitch parameter 
P, a pitch predictor parameter PP, and a set of said linear 
predictive coefficient parameters LPC, and the computation of 
quantized values QG, QP, QPP and QLPC of parameters G, P, PP and 
LPC using one or more indexed quantizing tables for the 
computation of each quantized parameter or set of parameters,
for each frame transmitting indices of said quantized
parameters QG, QP, QPP and QLPC determined in the initial analysis
step as side information about vectors analyzed for later use in
looking up in one or more identical tables said quantized
parameters QG, QP, QPP and QLPC while reconstructing speech and
audio vectors from encoded vectors in a frame, where each index
for a quantized parameter points to a location in one or more of
said identical tables where said quantized parameter may be found,
computing a zero-state response vector from the vector output 
of a cascaded filter comprising a scaling unit, synthesis filter 
and weighting filter identical in operation to said scaling unit, 
synthesis filter and weighting filter used for encoding said input 
vectors, said zero-state response vector being computed for each 
vector in said permanent codebook by first setting to zero the 
initial condition of said cascaded filter so that the response 
computed is not influenced by a preceding one of said codebook 
vectors processed by said cascaded filter, and then using said 
quantized values of said residual gain factor, set of linear 
predictive coefficient parameters, and said set of perceptual 
weighting parameters computed in said initial analysis step by 
processing each vector in said permanent codebook through said 
zero-input response filter to compute a zero-state response 
vector, and storing each zero-state response vector computed in a 
zero-state response codebook at or together with an index 
corresponding to the index of said vector in said permanent 
codebook used for this zero-state response computation step, and
after thus performing an initial analysis of and computing a
zero-state response codebook for each successive frame of input
speech or audio vectors, encode each input vector s_n of a frame in
sequence by transmitting the codebook index of the vector in said
permanent codebook which corresponds to the index of a zero-state
response vector in said zero-state response codebook that best
matches a vector v_n obtained from an input vector s_n by
subtracting a long term pitch prediction vector ŝ_n from the
input vector s_n to produce a difference vector d_n and filtering
said difference vector d_n by said perceptual weighting filter to
produce a final input vector f_n, where said long term pitch
prediction ŝ_n is computed by taking a vector from said permanent
codebook at the address specified by the preceding particular
index transmitted as a compressed vector code and performing gain
scaling of this vector using said quantized gain factor QG, then
synthesis filtering the vector obtained from said scaling using
said quantized values QLPC of said set of linear predictive
coefficient parameters to obtain a vector d̂_n and from vector d̂_n
producing a long term pitch predicted vector ŝ_n of the next input
vector s_n through a pitch synthesis filter using said quantized
values of pitch predictor parameters QP and QPP, said long term
prediction vector ŝ_n being a prediction of the next input vector,

producing said vector v_n by subtracting from said final input
vector f_n the vector output of said zero-input response filter
generated in response to a permanent codebook vector at the
codebook address of the last transmitted index code, said vector
output being generated by processing through said zero-input
response filter said permanent codebook vector located at said
last transmitted index code, where the output of said zero-input
response filter is discarded while said permanent codebook vector
located at said last transmitted index code is being processed
sample by sample in sequence into said zero-input response filter
until all samples of said codebook vector have been entered, and
where the input of said zero-input response filter is interrupted
after all samples of said codebook vector have been entered and
then the desired vector output from said zero-input response
filter is processed out sample by sample for subtraction from said
final vector f_n, and

for each input vector s_n in a frame, finding the vector
stored in said zero-state response codebook which best matches the
vector v_n, thereby finding the best match of a codebook vector
with an input vector, using an estimate vector ŝ_n produced from
the best match codebook vector found for the preceding input
vector,

having found the best match of said vector v_n with a
zero-state response vector in said zero-state response codebook for an
input speech or audio vector s_n, transmit the zero-state response
codebook index of the current best-match zero-state response
vector as a compressed vector code of the current input vector,
and also use said index of the current best-match zero-state
response vector to select a vector from said permanent codebook
for computing said long term pitch predicted input vector ŝ_n to be
subtracted from the next input vector s_n of the frame.



2. An improvement as defined in claim 1, including a method
for reconstructing said input speech or audio vectors from index
coded vectors at a receiver, comprised of decoding said side
information transmitted for each frame of index coded vectors,
using the indices received to address a permanent codebook
identical to said permanent codebook in said transmitter to
successively obtain decoded vectors, scaling said decoded vectors
by said quantized gain factor QG, and performing synthesis
filtering using said set of linear predictive coefficient
parameters and pitch synthesis filtering using said quantized
pitch parameters QP and QPP to produce approximation vectors s̃_n of
the original signal vectors s_n.

3. An improvement as defined in claim 2 wherein said
receiver includes postfiltering of said approximation vectors s̃_n
by long-delay postfiltering and short-delay postfiltering in
cascade, said quantized pitch and quantized pitch predictor 
parameters controlling said long-term postfiltering and said 
quantized linear predictive coefficient parameters controlling 
said short-term postfiltering, whereby adaptive postfiltered 
digitally encoded speech or audio vectors are provided. 

4. An improvement as defined in claim 3 including automatic
gain control of the adaptive postfiltered digitally encoded speech
or audio signal provided by estimating the square root of the
power of said postfiltered speech or audio signal to obtain a
value σ_2(n) of said postfiltered speech or audio signal and
estimating the square root of the power of a postfiltering input
speech or audio signal to obtain a value σ_1(n) of decoded
input speech or audio vectors before postfiltering, and
controlling the gain of the postfiltered speech or audio output
signal by a scaling factor that is a ratio of σ_1(n) to σ_2(n).

5. An improvement as defined in claim 4 wherein said 
quantized gain factor, quantized pitch and quantized pitch 
predictor parameters, and quantized linear predictive coefficient 
parameters are derived from said side information transmitted to 
said receiver. 

6. An improvement as defined in claim 3 wherein
postfiltering is accomplished by using a transfer function for
said long-delay postfilter of the form

$$H_l(z) = C_g \frac{1 + \gamma z^{-p}}{1 - \lambda z^{-p}}, \qquad C_g = \frac{1 - \lambda/x}{1 + \gamma/x}$$

where C_g is an adaptive scaling factor, p is the quantized value
QP of the pitch parameter P, and the factors γ and λ are
determined according to the following formulas

$$\gamma = C_z f(x), \qquad \lambda = C_p f(x), \qquad 0 < C_z, C_p < 1$$

where C_z and C_p are fixed scaling factors,

$$f(x) = \begin{cases} 1 & \text{if } x > 1 \\ x & \text{if } U_{th} \le x \le 1 \\ 0 & \text{if } x < U_{th} \end{cases}$$

U_th is an unvoiced threshold value, and x is a voicing
indicator parameter that is a function of coefficients b_1, b_2 and
b_3, where b_1, b_2, b_3 are coefficients of said quantized pitch
predictor QPP given by

$$P_l(z) = 1 - b_1 z^{-p+1} - b_2 z^{-p} - b_3 z^{-p-1}$$

where z is the inverse of the unit delay operator z^-1 used in the
z transform representation of transfer functions.

7. An improvement as defined in claim 6 wherein
postfiltering is accomplished by using a transfer function for
said short-delay postfilter of the form

$$\frac{1 - P(z/\beta)}{1 - P(z/\alpha)}, \qquad 0 < \beta < \alpha < 1$$

where α and β are bandwidth expansion coefficients.

8. An improvement as defined in claim 7 wherein
postfiltering further includes in cascade first-order filtering
with a transfer function

$$1 - \mu z^{-1}, \qquad \mu < 1$$

where μ is a coefficient.

9. A postfiltering method for enhancing digitally processed
speech or audio signals comprising the steps of buffering said 
speech or audio signals into frames of vectors, each vector having 
K successive samples, 

performing analysis of said buffered frames of speech or 
audio signals in predetermined blocks to compute linear predictive 
coefficients, pitch and pitch predictor parameters, and 

filtering each vector with long-delay and short-delay 
postfiltering in cascade, said long-delay postfiltering being
controlled by said pitch and pitch predictor parameters and said
short-delay postfiltering being controlled by said linear
predictive coefficient parameters, wherein postfiltering is
accomplished by using a transfer function for said short-delay
postfilter of the form

$$\frac{1 - P(z/\beta)}{1 - P(z/\alpha)}, \qquad 0 < \beta < \alpha < 1$$

where z is the inverse of the unit delay operator z^-1 used in the
z transform representation of transfer functions, and α and β are
fixed scaling factors.

10. A postfiltering method as defined in claim 9 including 
automatic gain control of the postfiltered digitally encoded 
speech or audio signal provided by estimating the square root of 
the power of said postfiltered digitally encoded speech or audio 
signal to obtain a value $\sigma_2(n)$ of said postfiltered speech signal 
and estimating the square root of the power of a postfiltering 
input speech or audio signal to obtain a value $\sigma_1(n)$ of decoded 
input speech or audio signal before postfiltering, and controlling 
the gain of the postfiltered speech or audio signal by a scaling 
factor that is a ratio of $\sigma_1(n)$ to $\sigma_2(n)$. 
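
The automatic gain control of claim 10 can be sketched as below. The claim does not say how the power is estimated, so this example assumes a first-order recursive average with a hypothetical smoothing constant `zeta`. It also reads the scaling factor as the input level divided by the output level, so that the postfiltered signal is brought back to roughly the level of the decoded signal. The function name and the `eps` guard are illustrative.

```python
import math


def automatic_gain_control(decoded, postfiltered, zeta=0.99, eps=1e-12):
    """Scale each postfiltered sample by sigma_1(n) / sigma_2(n).

    sigma_1(n): square root of the estimated power of the decoded signal
                (the postfilter input).
    sigma_2(n): square root of the estimated power of the postfiltered signal.
    The recursive power estimate is an assumption of this sketch.
    """
    p1 = 0.0  # running power estimate of the postfilter input
    p2 = 0.0  # running power estimate of the postfilter output
    out = []
    for s1, s2 in zip(decoded, postfiltered):
        p1 = zeta * p1 + (1.0 - zeta) * s1 * s1
        p2 = zeta * p2 + (1.0 - zeta) * s2 * s2
        sigma1 = math.sqrt(p1)
        sigma2 = math.sqrt(p2)
        # Leave the sample unscaled while the output level is near zero.
        gain = sigma1 / sigma2 if sigma2 > eps else 1.0
        out.append(gain * s2)
    return out
```

A value of `zeta` close to 1 makes the gain vary slowly; the right value depends on the sampling rate and is left as a tuning choice.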

11. A postfiltering method as defined in claim 10 wherein 
postfiltering is accomplished by using a transfer function for 
said long-delay postfilter of the form 

$$C_g \, \frac{1 + \gamma z^{-p}}{1 - \lambda z^{-p}}$$

where $C_g$ is an adaptive scaling factor, $p$ is the quantized value 
of the pitch parameter QP and the factors $\gamma$ and $\lambda$ are adaptive 
bandwidth expansion parameters determined according to the 
following formulas 

$$\gamma = C_z f(x), \qquad \lambda = C_p f(x), \qquad 0 < C_z, C_p < 1$$

where $C_z$ and $C_p$ are fixed scaling factors and 

$$f(x) = \begin{cases} 1 & \text{if } x > 1 \\ x & \text{if } U_{th} \le x \le 1 \\ 0 & \text{if } x < U_{th} \end{cases}$$

$U_{th}$ is an unvoiced threshold value, and $x$ is a voicing indicator 
that is a function of coefficients $b_1$, $b_2$, $b_3$, where $b_1$, $b_2$, $b_3$ are 
coefficients of said quantized pitch predictor QPP given by 

$$P_1(z) = 1 - b_1 z^{-p+1} - b_2 z^{-p} - b_3 z^{-p-1}$$

where $z$ is the inverse of the unit delay operator $z^{-1}$ used in the 
z-transform representation of transfer functions. 

12. A postfiltering method as defined in claim 11 wherein 
postfiltering further includes in cascade first-order filtering 
with a transfer function 

$$1 - \mu z^{-1}, \qquad \mu < 1$$

where $\mu$ is a coefficient. 

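
The adaptive long-delay postfilter of claim 11 can be sketched as below, under several assumptions that are not fixed by the claim. The filter is taken to have the pole-zero comb form $C_g (1 + \gamma z^{-p}) / (1 - \lambda z^{-p})$. The voicing indicator $x$ is taken to be the sum $b_1 + b_2 + b_3$, whereas the claim only says it is a function of those coefficients. The values of `c_z`, `c_p` and `u_th` are hypothetical, and $C_g$ is simply passed in by the caller rather than adapted.

```python
def voicing_weight(x, u_th=0.6):
    """f(x): 1 if x > 1, x if u_th <= x <= 1, 0 if x < u_th."""
    if x > 1.0:
        return 1.0
    if x < u_th:
        return 0.0
    return x


def long_delay_postfilter(signal, pitch, taps,
                          c_z=0.5, c_p=0.25, u_th=0.6, c_g=1.0):
    """Apply c_g * (1 + gamma z^-p) / (1 - lam z^-p) to a list of samples.

    pitch is the quantized pitch period p in samples; taps is (b1, b2, b3).
    Using x = b1 + b2 + b3 as the voicing indicator is an assumption.
    """
    x = sum(taps)
    gamma = c_z * voicing_weight(x, u_th)   # zero-section weight
    lam = c_p * voicing_weight(x, u_th)     # pole-section weight
    state = []                              # unscaled filter output w(n)
    for n, s in enumerate(signal):
        w = s
        if n >= pitch:
            w += gamma * signal[n - pitch] + lam * state[n - pitch]
        state.append(w)
    return [c_g * w for w in state]
```

When the indicator falls below `u_th`, both weights become zero and the filter passes the signal through scaled only by `c_g`, which matches the intent of switching the pitch emphasis off for unvoiced speech.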
ABSTRACT 

Disclosed is an apparatus and method to encode 
in real time analog speech or audio waveforms into a 
compressed bit stream for storage and/or transmission, and 
subsequent reconstruction of the waveform for reproduction. 
Also disclosed is an apparatus and method to provide adaptive 
post-filtering of a speech or audio signal that has been 
corrupted by noise resulting from a coding system or other 
sources of degradation so as to enhance the perceived quality 
of the speech or audio signal. The invention combines the 
power of Vector Quantization (VQ) and Adaptive Predictive 
Coding (APC) by providing a Vector Adaptive Predictive Coder 
(VAPC) which provides high-quality speech at bit rates 
between 4.8 and 9.6 kb/s, thus bridging the gap between 
scalar coders and VQ coders. 